Collecting Valid URL Links Using EditorKit and HTTP HEAD Requests :: Java Network I/O [SSISO Community]
 
Posted: 2008-03-10 01:05:43
Title: Collecting Valid URL Links Using EditorKit and HTTP HEAD Requests
Extracting valid URL links from an HTML page using EditorKit and HTTP HEAD requests

This example uses EditorKit to collect the URL in the href attribute of every A tag in an HTML page, then sends an HTTP HEAD request to each collected link to check whether it is valid.
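The extraction step on its own can be tried without a network connection. The following is a minimal sketch, assuming the HTML is already available as a string; the class name HrefExtractor and the method extractHrefs are hypothetical names chosen for illustration and do not appear in the original post. Unlike the full program, it returns every href it finds, including relative ones.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

class HrefExtractor
{
      // Hypothetical helper: returns the href of every A tag found in the given HTML text.
      static List<String> extractHrefs(String html)
      {
            HTMLEditorKit kit = new HTMLEditorKit();
            Document doc = kit.createDefaultDocument();
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

            try (Reader rd = new StringReader(html))
            {
                  // Parse the HTML into the document.
                  kit.read(rd, doc, 0);
            }
            catch (IOException | BadLocationException e)
            {
                  throw new IllegalStateException("could not parse HTML", e);
            }

            List<String> hrefs = new ArrayList<String>();
            ElementIterator it = new ElementIterator(doc);
            Element elem;

            while ((elem = it.next()) != null)
            {
                  // Anchor attributes are stored under the HTML.Tag.A key.
                  Object a = elem.getAttributes().getAttribute(HTML.Tag.A);

                  if (a instanceof AttributeSet)
                  {
                        Object href = ((AttributeSet) a).getAttribute(HTML.Attribute.HREF);
                        if (href != null)
                        {
                              hrefs.add(href.toString());
                        }
                  }
            }
            return hrefs;
      }

      public static void main(String[] args)
      {
            String html = "<html><body><p><a href=\"http://example.com/a\">A</a> "
                        + "and <a href=\"/relative\">B</a></p></body></html>";

            for (String href : extractHrefs(html))
            {
                  System.out.println(href);
            }
      }
}
```

Returning a list instead of validating inside the loop keeps parsing and network checks separate, which makes the parsing part easy to test in isolation.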

Response codes (RFC 2616)
1xx : Informational
2xx : Successful
3xx : Redirection
4xx : Client Error
5xx : Server Error
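The first digit of the response code is enough to tell which class a response belongs to (RFC 2616 names the 4xx class Client Error). A minimal sketch of that mapping follows; the class ResponseClass and the method describe are hypothetical names used only for this illustration.

```java
class ResponseClass
{
      // Hypothetical helper: maps an HTTP status code to its RFC 2616 class name.
      static String describe(int code)
      {
            switch (code / 100)
            {
                  case 1: return "Informational";
                  case 2: return "Successful";
                  case 3: return "Redirection";
                  case 4: return "Client Error";
                  case 5: return "Server Error";
                  // Covers -1 (no valid response) and anything outside 100..599.
                  default: return "Unknown";
            }
      }

      public static void main(String[] args)
      {
            int[] samples = { 200, 302, 403, 500 };

            for (int code : samples)
            {
                  System.out.println("[" + code + "] " + describe(code));
            }
      }
}
```

Whether a link counts as "valid" is a policy decision on top of this mapping; treating only 2xx as valid is one reasonable choice, but the original post simply prints the code.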

Source code:

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

import javax.swing.text.AttributeSet;
import javax.swing.text.Document;
import javax.swing.text.EditorKit;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

public class ExtractURLFromHTMLPage
{
      public static void main(String[] args)
      {
            // Report redirects (3xx) instead of following them automatically.
            HttpURLConnection.setFollowRedirects(false);
            EditorKit kit = new HTMLEditorKit();
            Document doc = kit.createDefaultDocument();

            // The Document class does not yet handle charsets properly.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

            try
            {
                  // Create a reader on the HTML content.
                  URL url_ = new URL("http://www.google.co.kr");
                  URLConnection conn = url_.openConnection();
                  Reader rd = new InputStreamReader(conn.getInputStream());

                  try
                  {
                        // Parse the HTML.
                        kit.read(rd, doc, 0);
                  }
                  finally
                  {
                        rd.close();
                  }

                  // Iterate through the elements of the HTML document.
                  ElementIterator it = new ElementIterator(doc);
                  javax.swing.text.Element elem;

                  while ((elem = it.next()) != null)
                  {
                        // Anchor attributes are stored under the HTML.Tag.A key.
                        AttributeSet s = (AttributeSet)
                              elem.getAttributes().getAttribute(HTML.Tag.A);

                        if (s != null)
                        {
                              validateHref((String) s.getAttribute(HTML.Attribute.HREF));
                        }
                  }
            }
            catch (Exception e)
            {
                  e.printStackTrace();
            }
      }

      /**
        * Validates a URL using the HTTP HEAD request method.
        *
        * Sends a HEAD request to the URL and uses the response code
        * to check whether the URL is valid.
        *
        * Response codes (RFC 2616)
        * 1xx : Informational
        * 2xx : Successful
        * 3xx : Redirection
        * 4xx : Client Error
        * 5xx : Server Error
        *
        * @param urlString the URL to check; only absolute http:// URLs are checked
        */
      private static void validateHref(String urlString)
      {
            if ((urlString != null) && urlString.startsWith("http://"))
            {
                  try
                  {
                        URL url = new URL(urlString);
                        URLConnection connection = url.openConnection();

                        if (connection instanceof HttpURLConnection)
                        {
                              HttpURLConnection httpConnection = (HttpURLConnection) connection;

                              try
                              {
                                    // HEAD returns only the status line and headers, no body.
                                    httpConnection.setRequestMethod("HEAD");
                                    httpConnection.connect();

                                    int response = httpConnection.getResponseCode();
                                    System.out.println("[" + response + "]" + urlString);

                                    // Redirect responses carry the target in the Location header.
                                    String location = httpConnection.getHeaderField("Location");

                                    if (location != null)
                                    {
                                          System.out.println("Location: " + location);
                                    }

                                    System.out.println();
                              }
                              finally
                              {
                                    httpConnection.disconnect();
                              }
                        }
                  }
                  catch (IOException e)
                  {
                        e.printStackTrace();
                  }
            }
      }
}

  

  

Output (format: [response code] URL):

[403]http://news.google.co.kr/nwshp?hl=ko&ie=UTF-8&tab=wn  
[403]http://groups.google.co.kr/grphp?hl=ko&ie=UTF-8&tab=wg
[302]http://www.google.com/ncr
Location: http://www.google.com/
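The 302 entry above appears with its Location header because automatic redirect following is switched off. To reproduce that behavior without depending on an external site, the sketch below sends HEAD requests to a throwaway local server built with the JDK's com.sun.net.httpserver.HttpServer. The class HeadCheckDemo, the methods head and startServer, the paths /ok and /moved, and the 5-second timeouts are all assumptions made for this illustration, not part of the original post.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class HeadCheckDemo
{
      // Hypothetical helper: sends a HEAD request and returns "[code]",
      // followed by the Location header when the response has one.
      static String head(String urlString)
      {
            HttpURLConnection c = null;
            try
            {
                  c = (HttpURLConnection) new URL(urlString).openConnection();
                  c.setInstanceFollowRedirects(false); // keep 3xx responses visible
                  c.setRequestMethod("HEAD");
                  c.setConnectTimeout(5000);           // assumed timeout values
                  c.setReadTimeout(5000);

                  int code = c.getResponseCode();
                  String location = c.getHeaderField("Location");
                  return "[" + code + "]" + (location != null ? " Location: " + location : "");
            }
            catch (IOException e)
            {
                  return "[error] " + e.getMessage();
            }
            finally
            {
                  if (c != null)
                  {
                        c.disconnect();
                  }
            }
      }

      // Starts a local test server on a free port:
      // /ok answers 200, /moved answers 302, every other path answers 404.
      static HttpServer startServer()
      {
            try
            {
                  HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
                  server.createContext("/", ex -> { ex.sendResponseHeaders(404, -1); ex.close(); });
                  server.createContext("/ok", ex -> { ex.sendResponseHeaders(200, -1); ex.close(); });
                  server.createContext("/moved", ex -> {
                        ex.getResponseHeaders().add("Location", "/ok");
                        ex.sendResponseHeaders(302, -1);
                        ex.close();
                  });
                  server.start();
                  return server;
            }
            catch (IOException e)
            {
                  throw new IllegalStateException("could not start test server", e);
            }
      }

      public static void main(String[] args)
      {
            HttpServer server = startServer();
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            try
            {
                  System.out.println(head(base + "/ok"));
                  System.out.println(head(base + "/moved"));
                  System.out.println(head(base + "/missing"));
            }
            finally
            {
                  server.stop(0);
            }
        }
}
```

Using setInstanceFollowRedirects on the single connection, rather than the static setFollowRedirects, limits the setting to this check and leaves other connections in the same JVM unaffected.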

[Source] Collecting Valid URL Links Using EditorKit and HTTP HEAD Requests | Author: 지원아빠
http://blog.naver.com/estern?Redirect=Log&logNo=110007144923
Copyright ⓒ 2005, SSISO Community. All Rights Reserved.